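Many model-based reinforcement learning (MBRL) methods provide guarantees on the accuracy of the Markov decision process (MDP) models they can deliver and on their learning efficiency. At the same time, state abstraction techniques make it possible to reduce the size of an MDP while keeping a bounded loss with respect to the original problem. It is therefore surprising that no guarantees are available when the two techniques are combined, i.e., when MBRL only observes abstract states. Our theoretical analysis shows that abstraction can introduce dependence between samples collected online (e.g., in the real world), which means that most MBRL results cannot be directly extended to this setting. The new results in this work show that concentration inequalities for martingales can be used to overcome this problem, allowing results for algorithms such as R-MAX to be extended to the setting with abstraction. We thereby produce the first performance guarantees for Abstracted RL: model-based reinforcement learning with an abstracted model.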
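To make the setting concrete, the following Python sketch shows the counting that an R-MAX-style learner performs when it only observes abstract states. The abstraction `phi`, the known-threshold `m`, and the class name are illustrative assumptions, not the paper's code. The sketch does not capture the paper's contribution: because several ground states share one abstract state, the transitions counted for a pair (x, a) depend on which ground states happened to be visited, so they need not be independent samples, and a standard independence-based argument for choosing `m` does not carry over directly. The abstract states that martingale concentration inequalities are what close this gap.

```python
from collections import defaultdict


class AbstractRMax:
    """R-MAX-style model bookkeeping over abstract states (illustrative sketch).

    Hypothetical parameters: `phi` maps a ground state to an abstract state,
    `m` is the number of visits after which an (abstract state, action) pair
    is treated as known, and `r_max` is an upper bound on the reward.
    """

    def __init__(self, phi, m, r_max):
        self.phi = phi
        self.m = m
        self.r_max = r_max
        self.counts = defaultdict(int)          # n(x, a)
        self.reward_sums = defaultdict(float)   # summed rewards observed at (x, a)
        self.next_counts = defaultdict(lambda: defaultdict(int))  # n(x, a, x')

    def known(self, x, a):
        return self.counts[(x, a)] >= self.m

    def update(self, s, a, r, s_next):
        # The learner only ever sees the abstract states phi(s) and phi(s_next).
        x, x_next = self.phi(s), self.phi(s_next)
        if self.known(x, a):
            return  # as in R-MAX, estimates are frozen once a pair is known
        self.counts[(x, a)] += 1
        self.reward_sums[(x, a)] += r
        self.next_counts[(x, a)][x_next] += 1

    def reward(self, x, a):
        # Optimism under uncertainty: unknown pairs are assumed to pay r_max.
        if not self.known(x, a):
            return self.r_max
        return self.reward_sums[(x, a)] / self.counts[(x, a)]

    def transition(self, x, a):
        # Unknown pairs self-loop; together with reward r_max this gives them
        # the maximal value r_max / (1 - gamma) under an optimal planner.
        if not self.known(x, a):
            return {x: 1.0}
        n = self.counts[(x, a)]
        return {x2: c / n for x2, c in self.next_counts[(x, a)].items()}
```

A planner would solve the abstract MDP defined by `reward` and `transition` and act greedily; unknown pairs then attract exploration until they have been visited `m` times.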
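Because of its high sample complexity, reinforcement learning currently depends on simulation for successful application. However, many real-world problems exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.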
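A minimal Python sketch of one such local component, under loudly stated assumptions: the local state is a queue length, the influence from the rest of the network is the number of arrivals per step, and the "model" is a count table rather than the learned predictor the paper trains. All names here are hypothetical. Each worker process could own one `local_step` simulator together with its own `InfluencePredictor`, so the components can run independently.

```python
import random
from collections import defaultdict


class InfluencePredictor:
    """Count-based estimate of P(u | x): the influence u that the rest of the
    network exerts on a local component whose local state is x.

    Hypothetical stand-in for the learned model described in the text.
    """

    def __init__(self, default=0):
        self.default = default  # returned for local states never seen in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, samples):
        # `samples` are (local_state, influence) pairs extracted from real or
        # global trajectories; calling this periodically refreshes the model.
        for x, u in samples:
            self.counts[x][u] += 1

    def sample(self, x, rng):
        table = self.counts.get(x)
        if not table:
            return self.default
        values, weights = zip(*table.items())
        return rng.choices(values, weights=weights)[0]


def local_step(queue_length, serve, predictor, rng):
    """Toy local simulator: a queue whose arrivals come from neighbouring
    components, summarized by the influence predictor."""
    arrivals = predictor.sample(queue_length, rng)
    departures = 1 if serve and queue_length > 0 else 0
    return queue_length + arrivals - departures
```

Because `train` can be called again on fresh trajectories, the predictor can follow the other components as their behavior changes during learning, which is the purpose the text gives for periodic retraining.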
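Learning to cooperate with other agents is challenging when those agents can also adapt to our own behavior. Practical and theoretical approaches to learning in cooperative settings typically assume that the other agents' behaviors are stationary, or make very specific assumptions about the other agents' learning processes. The goal of this work is to understand whether we can reliably learn to cooperate with other agents without such restrictive assumptions, which are unlikely to hold in real-world applications. Our main contribution is a set of impossibility results, which show that no learning algorithm can reliably learn to cooperate with all possible adaptive partners in a repeated matrix game, even if that partner is guaranteed to cooperate with some stationary strategy. Motivated by these results, we discuss potential alternative assumptions that capture the idea that an adaptive partner will only adapt rationally to our behavior.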
How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn, online, an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is learning, we develop a method that adaptively decides which simulator to use for every simulation, based on a statistic that measures the accuracy of the approximate simulator. This allows us to use the approximate simulator in place of the original one for faster simulations whenever it is accurate enough in the current context, thus trading off simulation speed and accuracy. Experimental results in two large domains show that, when integrated with POMCP, our approach plans with efficiency that improves over time.
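One way to read the adaptive choice described above, sketched in Python under assumptions: both simulators are deterministic functions of (state, action), the "context" is any hashable key the planner supplies, and the accuracy statistic is the fraction of recent cross-checks on which the two simulators agree. The statistic, thresholds, and class name are hypothetical, not the paper's, and the online training of the approximate simulator is not shown.

```python
import random
from collections import defaultdict, deque


class SimulatorSelector:
    """Per-context choice between the original simulator and a learned approximation.

    Hypothetical accuracy statistic: the fraction of recent cross-checks in which
    the approximate simulator returned exactly what the original returned.
    Assumes deterministic simulators so outputs can be compared directly.
    """

    def __init__(self, exact_sim, approx_sim, threshold=0.9, min_checks=20,
                 window=100, check_prob=0.1, rng=None):
        self.exact_sim = exact_sim
        self.approx_sim = approx_sim
        self.threshold = threshold    # minimum accuracy before trusting approx_sim
        self.min_checks = min_checks  # minimum evidence before trusting approx_sim
        self.check_prob = check_prob  # keep auditing the approximation occasionally
        self.rng = rng or random.Random()
        self.history = defaultdict(lambda: deque(maxlen=window))

    def accuracy(self, context):
        h = self.history[context]
        return sum(h) / len(h) if h else 0.0

    def step(self, context, state, action):
        h = self.history[context]
        trusted = len(h) >= self.min_checks and self.accuracy(context) >= self.threshold
        if trusted and self.rng.random() >= self.check_prob:
            return self.approx_sim(state, action)  # fast path
        # Slow path: use the original simulator and score the approximation.
        result = self.exact_sim(state, action)
        h.append(1 if self.approx_sim(state, action) == result else 0)
        return result
```

Keeping `check_prob` above zero means the approximation is still audited after it is trusted, so the selector falls back to the original simulator if accuracy in a context drops below `threshold`.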
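Inferring reward functions from demonstrations and pairwise preferences is a promising approach for aligning reinforcement learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, which makes it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms into a Pareto-optimal policy. By maintaining a distribution over scalarization weights, our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need to compute multiple policies. We empirically demonstrate the effectiveness of MORAL in two scenarios, which model a delivery task and an emergency task that require an agent to act in the presence of normative conflicts. Overall, we consider our work a step towards multi-objective RL with learned rewards, bridging the gap between the current reward learning and machine ethics literature.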
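Maintaining a distribution over scalarization weights can be sketched as follows, assuming linear scalarization of per-objective returns and a Bradley-Terry likelihood for pairwise preferences. The particle representation, `beta`, and all names are illustrative choices, not necessarily what MORAL uses.

```python
import math
import random


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def sample_simplex(k, rng):
    # Uniform sample from the probability simplex via normalized exponentials.
    xs = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(xs)
    return [x / total for x in xs]


class WeightPosterior:
    """Particle approximation of a distribution over linear scalarization weights.

    Assumed preference model (Bradley-Terry): P(A preferred to B | w) =
    sigmoid(beta * w . (R_A - R_B)), with R_A, R_B per-objective return vectors.
    """

    def __init__(self, num_objectives, num_particles=2000, beta=5.0, seed=0):
        rng = random.Random(seed)
        self.particles = [sample_simplex(num_objectives, rng) for _ in range(num_particles)]
        self.log_weights = [0.0] * num_particles  # uniform prior over particles
        self.beta = beta

    def observe(self, returns_a, returns_b, a_preferred):
        # Bayesian update: multiply each particle by the likelihood of the answer.
        sign = 1.0 if a_preferred else -1.0
        diff = [ra - rb for ra, rb in zip(returns_a, returns_b)]
        for i, w in enumerate(self.particles):
            z = sign * self.beta * sum(wi * di for wi, di in zip(w, diff))
            self.log_weights[i] += log_sigmoid(z)

    def mean(self):
        # Posterior mean of w, normalizing the log-weights stably.
        top = max(self.log_weights)
        probs = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(probs)
        k = len(self.particles[0])
        return [sum(p * w[j] for p, w in zip(probs, self.particles)) / total for j in range(k)]


def scalarize(weights, reward_vector):
    # Single scalar reward for the RL agent: w . r
    return sum(w * r for w, r in zip(weights, reward_vector))
```

Training a single agent on `scalarize(posterior.mean(), r)` and updating the posterior as preferences arrive is one way to steer one policy interactively instead of computing a separate policy per weighting.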
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
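The parametric perspective can be seen in miniature on a conjugate model where the exact answer is known. The sketch below assumes x_i ~ N(θ, σ²) with known σ, a prior θ ~ N(0, τ²), and a Gaussian variational family q(θ) = N(m, s²). It maximizes the closed-form ELBO by gradient ascent over the variational parameters (m, log s) and compares the result with the exact log marginal likelihood. The model and function names are illustrative choices, not taken from the tutorial.

```python
import math


def log_normal(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def elbo(m, s, xs, sigma, tau):
    """Closed-form ELBO for x_i ~ N(theta, sigma^2), theta ~ N(0, tau^2), q = N(m, s^2)."""
    expected_loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - ((x - m) ** 2 + s**2) / (2 * sigma**2)
        for x in xs
    )
    expected_logprior = -0.5 * math.log(2 * math.pi * tau**2) - (m**2 + s**2) / (2 * tau**2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s**2)
    return expected_loglik + expected_logprior + entropy


def exact_log_evidence(xs, sigma, tau):
    # Bayes' identity: log p(x) = log p(x|theta) + log p(theta) - log p(theta|x), any theta.
    precision = len(xs) / sigma**2 + 1 / tau**2
    post_mean = sum(xs) / sigma**2 / precision
    theta = post_mean
    return (sum(log_normal(x, theta, sigma**2) for x in xs)
            + log_normal(theta, 0.0, tau**2)
            - log_normal(theta, post_mean, 1 / precision))


def fit_vi(xs, sigma, tau, lr=0.05, steps=2000):
    """Gradient ascent on the ELBO over (m, rho = log s); no integration is performed."""
    m, rho = 0.0, 0.0
    precision = len(xs) / sigma**2 + 1 / tau**2
    for _ in range(steps):
        s = math.exp(rho)
        grad_m = sum(x - m for x in xs) / sigma**2 - m / tau**2  # dELBO/dm
        grad_rho = 1.0 - s**2 * precision                         # dELBO/d(log s)
        m += lr * grad_m
        rho += lr * grad_rho
    return m, math.exp(rho)
```

Because the true posterior lies in the variational family here, the ELBO at the optimum equals log p(x). In non-conjugate models the remaining gap is KL(q || p(θ | x)), and the expectations are usually estimated by Monte Carlo rather than in closed form.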
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG, where entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. We then systematically categorize and summarize existing work in terms of the type of KG and the method. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, from two aspects: at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be made available.
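A minimal Python sketch of how support masks can turn support features into a class center that re-weights query features. It assumes per-pixel feature vectors stored in plain lists and uses channel-wise sigmoid gating as the re-weighting rule. This gating is a stand-in chosen for illustration; RefT's actual mask-based dynamic weighting module and its Transformer components are not reproduced here.

```python
import math


def masked_average_pool(features, mask):
    """Class center from one support image: average only the foreground pixels.

    `features` is a list of C-dim pixel vectors; `mask` is a matching list of 0/1.
    """
    total = sum(mask)
    if total == 0:
        raise ValueError("support mask is empty")
    c = len(features[0])
    return [sum(f[j] for f, m in zip(features, mask) if m) / total for j in range(c)]


def class_center(support_shots):
    """Average the masked centers of K support (features, mask) pairs."""
    centers = [masked_average_pool(f, m) for f, m in support_shots]
    k = len(centers)
    return [sum(c[j] for c in centers) / k for j in range(len(centers[0]))]


def reweight_query(query_features, center):
    """Channel-wise re-weighting: scale each query channel by sigmoid(center_j)."""
    gate = [1.0 / (1.0 + math.exp(-v)) for v in center]
    return [[q * g for q, g in zip(pixel, gate)] for pixel in query_features]
```

The mask is what keeps background pixels out of the center, which is the benefit the abstract attributes to using support masks.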
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts degrade segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, the existing literature lacks such unified approaches for UDA tasks. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses the previous SOTA by 8%, achieving 58.2% mIoU.
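As a rough illustration of image-level alignment, the Python sketch below matches the per-channel mean and standard deviation of a source image to statistics computed over the whole target domain. This statistics-matching rule is a simple stand-in chosen for clarity, not the paper's global photometric alignment module, and images are assumed to be lists of per-pixel channel values.

```python
import math


def channel_stats(pixels):
    """Per-channel mean and standard deviation of a list of [c0, c1, ...] pixels."""
    n = len(pixels)
    c = len(pixels[0])
    means = [sum(p[j] for p in pixels) / n for j in range(c)]
    stds = [math.sqrt(sum((p[j] - means[j]) ** 2 for p in pixels) / n) for j in range(c)]
    return means, stds


def domain_stats(images):
    """Global statistics over every pixel of every image in a domain."""
    return channel_stats([p for img in images for p in img])


def align_photometry(source_pixels, target_means, target_stds, eps=1e-8):
    """Shift and scale each channel of a source image to match target-domain statistics."""
    means, stds = channel_stats(source_pixels)
    return [
        [(p[j] - means[j]) / (stds[j] + eps) * target_stds[j] + target_means[j]
         for j in range(len(p))]
        for p in source_pixels
    ]
```

For 8-bit images the output would normally be clipped to [0, 255]. Texture alignment and the feature-level steps are not covered by this sketch.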
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diverse relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
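Selecting the user property features with the greatest information gain can be sketched as below. The sketch assumes discrete (or already discretized) feature columns keyed by name and one label per user; how MGTAB discretizes continuous properties is not stated here, so any binning is left to the caller. With `k=20` it would mirror the selection size described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Rank named discrete feature columns by information gain and keep the top k."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]
```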